\section{Camlp4 Parser}

Camlp4's extensible parser is deeply combined with its own lexer, use
menhir if it is very complex and not ocaml-oriented. It is very hard
to debug in itself. So I suggest it is used to do simple
ocaml-oriented parsing.

\subsection{Examples}
Listing \ref{lst:camlp4 calc} is a simple calculator example using
camlp4 parser


\inputminted[fontsize=\scriptsize]{ocaml}{code/camlp4/arith/simple_arith.ml}
\captionof{listing}{Simple Calc Parser\label{lst:camlp4 calc}}



Some keywords for extensible paser

\begin{ocamlcode}
  EXTEND END  LIST0 LIST1 SEP TRY SELF OPT  FIRST LAST  LEVEL AFTER BEFORE
\end{ocamlcode}
\captionof{listing}{CAMLP4 KEYWORDS   \label{lst:camlp4 keywords}}


\textit{SELF} represents either the \textbf{current level},
\textbf{the next level} or the \textbf{ first level} depending on the
\textbf{ associativity} and the \textbf{position} of the \textit{SELF}
in the rule .

The identifier \textit{NEXT}, which is a call to the next level of the
current entry.

\subsection{Mechanism}
  There are four generally four phases
  \begin{enumerate}
  \item Collect new keywords, and update the lexer.
  \item represent the grammar as a tree data structure.
  \item left-factoring of each precedence level. there's a common
    perfix of symblos(a symbol is a keyword, token, or entry ), the
    parser does not branch until the common parser has been
    parsed. \textit{that's how grammars are implemented, first the
      corresponding tree is generated, then the parser is generated
      for the tree.}
  \item Greedy first. One rule is a prefix of another.  A token or
    keyword is \textit{preferred to epsilon, the empty string} (this
    also holds for other ways that a grammar can match epsilon )
    factoring happens when the parser is built .
  \item Explicit token or keyword \textit{trumps an entry}. So you
    have two prductions, \textit{with the same prefix, except the last
      one. one is another entry, and the other is a token}, The parser
    will \textit{first} try the token, if it succeeds, it stops,
    otherwise they try the entry. This sounds weird, but it is
    \textit{reasonable}, after left-factorization, the parser
    \textit{pays no cost} when it tries just a token, it's amazing
    that \textit{even more tokens}, the token rule \textit{still
      wins}, and even the token rule fails after consuming some
    tokens, it can \textit{even} transfer to the entry rule.  To
    notice, it seems that after factorization, the rule order
    \textit{is changed}. (Since they use peek here)
  \end{enumerate}

  Here are some experiments I did. Listing \ref{lst:Smart Camlp4
    Parser} is a example to show how smart camlp4 is.

  \inputminted[fontsize=\scriptsize]{ocaml}{code/camlp4/airth/second.ml}
  \captionof{listing}{Smart Camlp4 Parser \label{lst:Smart Camlp4
      Parser}}

\subsection{Parameterize Keywords}
\label{Parameterize Keywords}

   Here is another interesting example which parameterize keyword:

   \inputminted[fontsize=\scriptsize]{ocaml}{code/camlp4/airth/third.ml}
   \captionof{listing}{Parameterize Keyword \label{lst:Parameterize Keyword}}

   But the underlying mechanism is we add a filter plugin for the
   lexer, so to make it parse, we need a way to tell the parser that
   \textbf{B} is a keyword, we hint the compiler add a trailing
   \textbf{B}
   
  All the dispatch magic hides in \verb|Gram.extend| function (or
  \verb|Insert.extend| function)
  \href[]{camlp4/Camlp4/Struct/Grammar/Insert.ml}{}


  \begin{ocamlcode}
    value extend entry (position, rules) =
      let elev = levels_of_rules entry position rules in
      do {
        entry.edesc := Dlevels elev;
        entry.estart :=
          fun lev strm ->
            let f = Parser.start_parser_of_entry entry in
            do { entry.estart := f; f lev strm };
        entry.econtinue :=
          fun lev bp a strm ->
            let f = Parser.continue_parser_of_entry entry in
            do { entry.econtinue := f; f lev bp a strm }
      };
    \end{ocamlcode}
    \captionof{listing}{Gram Extend\label{Gram Extend}}



    Factoring \textit{only} happens in \textit{the same level within a
      rule}.

    You can do explicit backtracking by \textit{npeek trick} (Listing
    \ref{Camlp4 Explicit backtracking}), then you can switch to
    another branch by peeking some elements.


  \inputminted[fontsize=\scriptsize]{ocaml}{code/camlp4/explict_back_track/first.ml}
  \captionof{listing}{Explicit backtracking \label{Camlp4 Explicit backtracking}}



  Now we have a look at how left factorization is performed

  \begin{enumerate}
  \item Left Factorization \\
    Take rules as follows as an example
    
    \begin{ocamlcode}
      "method"; "private"; "virtual"; l = label; ":"; t = poly_type
      "method"; "virtual"; "private"; l = label; ":"; t = poly_type
      "method"; "virtual"; l = label; ":"; t = poly_type
      "method"; "private"; l = label; ":"; t = poly_type; "="; e = expr
      "method"; "private"; l = label; sb = fun_binding
      "method"; l = label; ":"; t = poly_type; "="; e = expr
      "method"; l = label; sb = fun_binding
    \end{ocamlcode}

    The rules are inserted in a tree and the result looks like:
    
\begin{ocamlcode}
  "method"
     |-- "private"
     |       |-- "virtual"
     |       |       |-- label
     |       |             |-- ":"
     |       |                  |-- poly_type
     |       |-- label
     |             |-- ":"
     |             |    |-- poly_type
     |             |            |-- ":="
     |             |                 |-- expr
     |             |-- fun_binding
     |-- "virtual"
     |       |-- "private"
     |       |       |-- label
     |       |             |-- ":"
     |       |                  |-- poly_type
     |       |-- label
     |             |-- ":"
     |                  |-- poly_type
     |-- label
           |-- ":"
           |    |-- poly_type
           |            |-- "="
           |                 |-- expr
           |-- fun_binding
      
    \end{ocamlcode}

This tree is built as long as rules are inserted.
\item \textit{start and continue}
  At each entry level, the rules are separated into \textbf{two
    trees}:
  \begin{enumerate}[(a)]
  \item The tree of the rules not starting with neither the current entry name
    nor by ``SELF''(start)
  \item The tree of the rules starting with the current entry or by
    SELF, this symbol \textbf{itself not being included} in the tree
  \end{enumerate}

  They determine two functions :
  \begin{enumerate}
  \item The function named {\color{red} ``start''}, analyzing the first tree
  \item The function named {\color{red} ``continue''}, taking, as parameter, a value
    previously parsed, and analyzing the second tree. 
  \end{enumerate}

  A call to an entry, correspond to a call to the \textbf{``start''} function of
  the \textbf{``first''} level of the entry.

  For the ``start'', it tries its tree, if it works, it calls the
  ``continue'' function of the same level, giving the result of ``start''
  as parameter. If this ``continue'' fails, return itself. (continue may
  do some more interesting stuff). If the ``start'' function fails, the
  ``start'' of the next level is tested until it fails. 


  For the ``continue'', it first tries the ``continue'' function of the
  \textbf{next} level. (here + give into *), if it fails or it's the
  last level, it then tries itself, giving the result as parameter. If
  it still fails, return its extra parameter.

  A special case for rules ending with SELF or the current entry
  name. For this last symbol, there's a call to the ``start'' function
  of \textbf{the current level (RIGHTA) or the next level (OTHERWISE)}

  When a SELF or the current entry name is encountered in the middle
  of the rule, there's a call to the start of the \textbf{first level} of the
  current entry.

  Each entry has a start and continue

\begin{ocamlcode}
(* list of symbols, possible empty *)
LIST0 : LIST0 rule | LIST0 [ <rule definition> -> <action> ]
(* with a separator *)
LIST0 : LIST0 rule SEP <symbol>
| LIST0 [<rule definition > -> <action>] SEP <symbol>
  LIST1 rule
| LIST1 [<rule definition > -> <action > ]
| LIST1 rule SEP <symbol>
| LIST1 [<rule definition > -> <action >] SEP <symbol>
OPT <symbol>
SELF
TRY (* backtracking *)
FIRST LAST LEVEL level, AFTER level, BEFORE level 
\end{ocamlcode}

\end{enumerate}

\subsection{Parsing OCaml using Camlp4}

\subsubsection{Fully Utilize Camlp4 Parser and Printers}

If we want to define our special syntax, we could do it like this
Listing \ref{lst:camlp4_parser_printer}.

\begin{listing}
  \inputminted[fontsize=\scriptsize,]
  {ocaml}{code/camlp4/parsing_ocaml/customsyntax.ml}
  \caption{Camlp4 Self Parser Printer}
  \label{lst:camlp4_parser_printer}
\end{listing}

Here we see we could get any parser, any printer we want, very convenient.
Notice \verb|Gram.Entry| is \textbf{ dynamic, extensible}

\subsubsection{Otags Mini}


\inputminted[fontsize=\scriptsize,linenos=true,stepnumber=5]{ocaml}
{code/camlp4/parsing_ocaml/fullparser.ml}
\captionof{listing}{Otags Mini \label{Otags mini}}

We use \textit{OCamlInitSyntax.Make} instead of \textit{MakeSyntax},
since it will introduce unnecessary abstraction type, which makes
sharing code difficult.

Actually, we can use camlp4 parser to parse interfaces as well

\begin{ocamlcode}
let sig =
  let str = eval ``moduleX = Camlp4.PreCast;;''
  and _loc = Loc.ghost in
  Stream.of_string str |> Syntax.parse_intef _loc 
\end{ocamlcode}




%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "../master"
%%% End: 
